A Word Analysis System for German Hyphenation, Full Text Search, and Spell Checking, with Regard to the Latest Reform of German Orthography
Abstract
In text processing systems, German words require special treatment because compound words can be formed by combining existing words. To this end, a universal word analysis system is introduced that analyzes all words in German texts according to their atomic components. A recursive decomposition algorithm, following the rules for inflection, derivation, and compound formation in the German language, splits words into their smallest relevant parts (= atoms), which are stored in an atom table. The system is based on the foundations described in this article and is used for reliable, sense-conveying hyphenation, for sense-conveying full text search, and, in limited form, as a spelling checker.
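To make the idea of recursive decomposition against an atom table concrete, the following is a minimal sketch, not the system's actual algorithm. The tiny stem set, the list of linking elements, and the function names `decompose` and `major_breaks` are all assumptions made for illustration. The real system additionally covers inflection and derivation rules, which are omitted here.

```python
from functools import lru_cache

# Hypothetical toy atom table (assumption): a few stems and linking elements.
STEMS = {"arbeit", "zeit", "haus", "tür", "schlüssel", "kind", "garten"}
LINKERS = {"s", "er", "n", "es"}


def decompose(word, stems=STEMS, linkers=LINKERS):
    """Return every split of `word` of the form: stem (linker? stem)*."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def parse(start):
        # All decompositions of word[start:], each as a tuple of atoms.
        results = []
        for end in range(start + 1, len(word) + 1):
            stem = word[start:end]
            if stem not in stems:
                continue
            if end == len(word):
                results.append((stem,))
                continue
            # The next stem may follow directly ...
            for tail in parse(end):
                results.append((stem,) + tail)
            # ... or after a linking element such as the "s" in "Arbeitszeit".
            for linker in linkers:
                if word.startswith(linker, end):
                    for tail in parse(end + len(linker)):
                        results.append((stem, linker) + tail)
        return tuple(results)

    return [list(atoms) for atoms in parse(0)]


def major_breaks(atoms, linkers=LINKERS):
    """Join atoms with hyphens at stem boundaries; a linker stays with the stem before it."""
    parts = []
    for atom in atoms:
        if parts and atom in linkers:
            parts[-1] += atom
        else:
            parts.append(atom)
    return "-".join(parts)


print(decompose("Arbeitszeit"))
print(major_breaks(["arbeit", "s", "zeit"]))
```

A word with several valid decompositions yields several entries in the returned list, and a word that cannot be built from the table yields an empty list. In this sketch an empty result would be the hook for the limited spell-checking use mentioned above.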
Related resources
A Language-Sensitive Text Editor for Dutch
Modern word processors begin to offer a range of facilities for spelling, grammar and style checking in English. For the Dutch language hardly anything is available as yet. Many commercial word processing packages do include a hyphenation routine and a lexicon-based spelling checker but the practical usefulness of these tools is limited due to certain properties of Dutch orthography, as we will...
Testing a Word Analysis System for Reliable and Sense-Conveying Hyphenation and Other Applications
In this article, we present a test environment for a word analysis system that is used for reliable and sense-conveying hyphenation of German words. A crucial task is the hyphenation of compound words, a huge number of which can readily be formed from existing words. Because of this, testing and checking all existing words for correct hyphenation is infeasible. Therefore we have developed special...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Si3Trenn and Si3Silb: Using the SiSiSi Word Analysis System for Pre-hyphenation and Syllable Counting in German Documents
We present two applications of a word analysis system for the German language: pre-hyphenation of documents in various formats, and counting the syllables of all words of a document. The Si3Trenn preprocessor provides pre-hyphenation for file formats allowing for soft hyphens (currently: plain text, LaTeX, RTF). It applies reliable, sense-conveying hyphenation (SiSiSi) to each word of the input t...
Investigating lexical competition - An Empirical Case Study of the German Spelling Reform of 1996/2004/2006
The German spelling reform of 1996/2004/2006 triggered the introduction of new orthographic variants in the German spelling system. These were the products of different kinds of modifications enacted by the reform. They could be a result of a `mutation'-like change of some of the characters of a word (as, for example, the change from Biographie to Biografie), due to a writing as two words o...